Segmenting documents by stylistic character

Authors

  • Neil Graham
  • Graeme Hirst
  • Bhaskara Marthi
Abstract

As part of a larger project to develop an aid for writers that would help to eliminate stylistic inconsistencies within a document, we experimented with neural networks to find the points in a text at which its stylistic character changes. Our best results, well above baseline, were achieved with time-delay networks that used features related to the author’s syntactic preferences, whereas low-level and vocabulary-based features were not found to be useful. An alternative approach with character bigrams was not successful.


Similar resources

Segmenting a document by stylistic character

As part of a larger project to develop an aid for writers that would help to eliminate stylistic inconsistencies within a document, we experimented with neural networks to find the points in a text at which its stylistic character changes. Our best results, well above baseline, were achieved with time-delay networks that used features related to the author’s syntactic preferences. Low-level and...


A Novel Approach of Segmenting Touching and Kerned Characters

Character segmentation is a critical step of an OCR system. In this paper we discuss segmentation approaches for touching and kerned characters. A non-linear segmentation path-based algorithm for segmenting touching and kerned characters is put forward. First, touching and kerned characters are extracted and segregated from other characters by using character projections and recognition results. The...


Style-Directed Document Recognition

We are developing a document recognition system that can be tunably optimized for performance on documents of specific styles. We interactively generate XML to encode specific knowledge about a class of documents to be input to a recognition system. The encoding includes attributes of document logical structure as well as layout structure constraints. The encoding of document style is used to a...


Extracting and Segmenting Container Name from Container Images

Container name extraction is very important to modern container management systems. Similar techniques have been suggested for vehicle license plate recognition in past decades. Container name extraction is more complex than license plate extraction because of severe nonuniform illumination and the invalidation of color information. The main purpose of this paper is to propose a new me...


Efficient Social Network Multilingual Classification using Character, POS n-grams and Dynamic Normalization

In this paper we describe a dynamic normalization process applied to social network multilingual documents (Facebook and Twitter) to improve the performance of the Author profiling task for short texts. After the normalization process, n-grams of characters and n-grams of POS tags are obtained to extract all the possible stylistic information encoded in the documents (emoticons, character flood...



Journal title:
  • Natural Language Engineering

Volume 11  Issue

Pages  -

Publication date 2005